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review the background of the cluster algorithms 
in OTrjnte Carlo simulation of statistical physics problems. 
On#-<tf the first such successful algorithm was developed 
byjffirendsen and Wang eight years ago. In contrast to 
thajacal algorithms, cluster algorithms update dynami- 
cal^kriables in a global fashion. Therefore, large changes 
arq-a^ade in a single step. The method is very efficient, 
especially near the critical point of a second-order phase 
transition. Studies of various statistical mechanics models 
ancfeome generalizations of the algorithm will be briefly 
reviewed. We mention applications in other fields, espe- 
cially in imaging processing. 
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I. MONTE CARLO METHOD 



s 

Ope of the important task in statistical mechanics 
antUiuantum field theory is to compute the statistical 
avefige, 
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A(X)P(X), (1) 



whi§Ee the summation is over a set of all states X, 
which usually is very large. A general method was 
pmoosed long time ago in the fifties by Metropolis et 
aflcl, well-known as Metropolis algorithm. Instead of 
summing over all the states as required, one generates 
a selection of the states according to the probability 
P(X), so that the answer is given approximately by an 
arithmetic average of the quantity A(X). The gener- 
ation of a sequence of the states is done stochastically 
by the method of Markov chain. 



A. Markov chain 

A Markov chain! is a mathematical notion for a ran- 
dom walk in the state space X. We start from some 
initial point Xq 7 the next point X\ is generated ac- 
cording to certain transition probability. In this way 
a sequence of random points Xq, X±, • • ■ , Ajv is gen- 
erated. These points are required to appear with the 



probability P(X). The rule to generate next point 
given the point Xi is specified by the transition 



probability 



P{X i+l \X i ) = W{X i ^X i+l ). 



(2) 



where P(Xi+\\Xi) is the probability that the system 
is in state .Xj+i, given that the system was in state 
Xi, namely, it is the conditional probability. We have 

W(Xi -> X l+1 ) > o, Yl w ( x * = L ( 3 ) 

Since W is probability, it satisfies the usual constraint 
of probability — probability must be positive and total 
probability adds to one. 
If we choose W so that 



P(X) = J2P(X')W(X' -> X) 



(4) 
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we say that P(X) is invariant under the transition 
of W, or P(X) is the equilibrium distribution for the 
given transition probability. Such a probability distri- 
bution will be unique if starting from any state X, it 
is possible to get to any another state X' after a fi- 
nite number of transitions. We call such Markov chain 
ergodic (regular, or irreducible). 



B. Detailed balance 

How to choose W to get prescribed PI It is suffi- 
cient that W satisfies 

P{X')W(X' —> X) = P(X)W(X — > X'), (5) 

where P is the known distribution. If W satisfies this 
equation and it is ergodic, then the Markov chain will 
generate X according to the probability distribution 
P(X). 



C. Metropolis algorithm 

Metropolis method goes as follows: starting from 
some initial state (configuration) Xq, a sequence of 
states will be generated by a Markov chain through 
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the transition probability W(Xi — » Xi + i). The start- 
ing state may be a unique state or may be generated 
at random with certain distribution Pq. The new state 
X i+ i is based on the old state JQ. The new state is 
generated in such a way so that the probability that 
the system is in state X i+1 is just W(X^ — > X i+1 ), 
knowing that the system was in state Xi. Hopefully 
we can generate this probability distribution easily. 
Otherwise we might have generated P(X) directly in 
the first place. It is sufficient that W satisfies detailed 
balance condition and the Markov chain is ergodic. 
Metropolis algorithm refers to a specific choice of W: 

W(X^X>)=T(X^X')min(l,^ty (6) 

where X ^ X' and T{X -> X') = T(X' -> X). Ma- 
trix T can be any distribution but must be symmetric. 
We almost always use 

2YJf x') — < cons t> f° r X' in a region around X; 
' 1 0, otherwise. 

(7) 

The probability that the state does not change, 
W(X — > X), is determined by the normalization con- 
dition 

Y,W(X^X')=1. (8) 

X' 

II. ISING MODEL FOR MAGNETS 

To be able to understand Swendsen-Wang and other 
cluster algorithms, we need to introduce the very 
popular model for fefxomagnets in condensed matter 
physics. Ising modelou is the simplest model for fer- 
romagncts. One of the striking feature of a magnet is 
that it has spontaneous magnetization at lower tem- 
peratures. However, the magnetization disappears at 
some higher temperature T c . The phase below T c is 
called ferromagnetic phase and that above T c is called 
paramagnetic phase. T c is called critical temperature 
and there is a phase transition at this temperature. 

Let's take two-dimensional system as an example. 
On an L x L square lattice, we put a spin cr, at each 
lattice site i. Each spin takes only two possible values 
+1 and —1. The set of L 2 spins consists of the state 
space X. The system has a total energy as a function 
of the state {a}, defined by 

<i,i> 

where J is called coupling constant. The summation 
is over the nearest neighbors only, i.e., each bond is 
summed once. Such model is also referred as nearest 
neighbor Ising model. When J is positive, the model 
describes a ferromagnet; while J < describes an 
anti-ferromagnet. Let's consider just one term in the 



nearest neighbor interaction. The interaction energy 
of neighbors takes only two values, +J or — J. For 
ferromagnet, the energy is lower if the two spins are 
parallel. It costs energy if the spins are pointing in 
different directions. 

According to statistical-mechanical description, the 
system will not be in a definite state. Due to thermal 
fluctuation, the system will have some probability dis- 
tribution among the states. If the temperature T is a 
fixed parameter, we have the famous Boltzmann dis- 
tribution 

P( M )ocexp(-^Ml), (10) 

where fee ~ 1-38 x 10~ 23 Joule/Kelvin is called the 
Boltzmann constant. Our goal is to calculate average 
quantities such as energy, magnetization, correlation 
functions, etc. The internal energy is defined by 

U=(H)=J2H({a})P({a}); (11) 

M 

and the magnetization is defined as 

w=E|E^| p (w). (12) 

The summation is over all possible states. Since each 
site can have two states (spin up and spin down), we 
have 2 L states for a system of linear size L in d di- 
mensions. 

A Monte Carlo simulation of the Ising model by 
Metropolis algorithm involves the following steps: 

1 . Initialize Oi at each site with arbitrary spin val- 
ues, e.g., all spin up, or up or down with equal 
probability. The complete set of spins {cr} is 
the abstract state X discussed in the previous 
section. 

2. Choose a site k at random and propose to flip 
the spin at that site, a' k = —o^. The proposed 
state X is the one with one spin at location k 
flipped. This amounts to take T(X— >X) equal 
to 1/L d if X and X differ by 1 spin, and oth- 
erwise. 

3. Calculate the energy increment 

AE = H({a'})-H({a}) = 2Ja k £ a.,. (13) 

neighbors 
of k 

4. Accept the proposed state as the new state if a 
uniformly distributed random number (between 
and 1) is less than exp(— AE / ksT); retain the 
old state as the new state otherwise. 

5. go to 2. 

One Monte Carlo step will be defined as performing 
the above basic single-spin flip L d number of times. 
After one Monte Carlo step, each site on average has 
tried once to flip. 
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III. CRITICAL SLOWING DOWN 



IV. CLUSTER MONTE CARLO ALGORITHMS 



In Ising model simulation, or more generally any 
simulation using the Metropolis algorithm, the next 
configuration depends on the previous one. Due to 
this correlation between Monte Carlo steps, the for- 
mula for the statistical error e « a/y/W underesti- 
mates the true error, where a is the variance of the 
quantity of interest, and N is Monte Carlo steps. The 
error formula should be replaced by 



1 + 2r 



N 



(14) 



That is, the error is larger by a factor y/1 + 2r than 
the uncorrelated data. The quantity r is called cor- 
relation time, which is roughly the number of Monte 
Carlo steps needed to generate independent configu- 
rations. We can compute the correlation time r as 
follows: define the time-dependent correlation func- 
tion, 



/(*) 



(A(t')A(t' + t))-(A)< 
<^ 2 > - (A) 2 



(15) 



where time t is measured in terms of Monte Carlo 
steps — time is passed by one unit when one updates 
the system by one Monte Carlo step. And A(t') is the 
quantity at steps t' whose error we want to estimate. 
The angular brackets denote average over Monte Carlo 
steps. The correlation time is defined by 



£/(*)• 



(16) 



Now we are ready to explain the concept of critical 
slowing down. Just like many other quantities near 
the critical point, the correlation time also becomes 
singular at the critical point. On an infinite lattice, it 
diverges with a power law 



t cx IT - Tc 



(17) 



Here v is the correlation length exponent and z is 
called dynamic critical exponent. This phenomena 
that the correlation time becomes very large is called 
critical slowing down. It not only happens in com- 
puter simulation but also appears in real systems. On 
a finite system of linear size T, the correlation time 
will, of course, not go to infinity. But it will grow 
with size as 



r cx L\ T = T c 



(18) 



Substituting this result into the error formula, we find 
that e w <jL z / 2 /N 1 / 2 for large system near T c . For 
the two-dimensional Ising model with Metropolis al- 
gorithm, z « 2.1, we see typically that increasing the 
size leads to a larger error in the quantity to be cal- 
culated. 



The Metropolis algorithm makes changes locally 
one site at a time. This is the cause of the critical 
slowing down. There are algorithms that change the 
states by a groupppf clusters. The advantage of the 
cluster algorithmsa is that they are faster, not nec- 
essarily in computer time per Monte Carlo step, but 
in terms of dynamics. That is, it has a much smaller 
value of z. 



A. Swendsen-Wang algorithm 

The Swendsen-Wang multi-cluster algorithmic is 
closely related to percolations based on the work of 
Fortuin and KasteleynEJ. One starts with a spin config- 
uration {<r} and generates a percolation configuration 
based on the spin configuration by the rule described 
below. Then the old spin configuration is forgotten 
and a new spin configuration {a'} is generated based 
on the percolation configuration. The rule is such that 
the detailed balance is satisfied. Or more generally, 
the transition leaves the equilibrium probability in- 
variant. The algorithm goes as follows: 

1. Start with some arbitrary state {cr}. 

2. Go through each nearest neighbor connection 
of the lattice, create a bond between the two 
neighboring sites i and j with probability p — 
1 — e~ 2J / fc B T i but only if the spins are the same, 
a; = <7j. Never put a bond between the sites if 
the spin values are different. We are creating a 
bond percolation configuration with probability 
p on a subset of lattice sites where the spins are 
either all pointing up or down. 

3. Identify clusters as a set of sites connected by 
bonds, or isolated sites. Two sites are said to be 
in the same cluster if there is a connected path of 
bonds joining them. Every site has to belong to 
one of the clusters. After the clusters are found, 
each cluster is assigned a new Ising spin chosen 
with equal probability between +1 and — 1. The 
old spin values now can be 'forgotten'. And all 
the sites in a cluster now take the value of spin 
assigned to the cluster. 

4. One Monte Carlo step is finished. Repeat 2 for 
the next step. 

The performance of the algorithm in terms of corre- 
lation time in comparison with Metropolis algorithm 
is remarkable. Recall that the dynamic critical expo- 
nent z is about 2 for Metropolis algorithm almost in- 
dependent of the dimensionality. The Swendsen-Wang 
algorithm in one dimension gives z = (r approaches 
a constant as T — > 0). For the two-dimensional Ising 
model, the dynamic critical exponent z is less than 0.3 
or possibly zero (but r oc InT). In three dimensions 
it is about 0.5. At and above four dimensions it is 1. 
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B. Wolff single-cluster algorithm 

In the Swendsen-Wang algorithm, we generated 
many clusters and then flipped these clusters. Wolff 
algorithrr£3 is a variation on the way clusters are 
flipped. One picks a site at random, and then gen- 
erates one single cluster by growing a cluster from the 
seed. The neighbors of the seed site will also belong 
to the cluster if the spins are in parallel and a random 
number is less than p = 1 — e ~ 2J / k BT_ That is, the 
neighboring site will be in the same cluster as the seed 
site with probability p if the spins have the same sign. 
If the spins are different, neighboring site will never 
belong to the cluster. Neighbors of each new site in 
the cluster are tested for membership. This testing 
of membership is performed on pair of sites (forming 
a nearest neighbor bond) not more than once. The 
recursive process will eventually terminate. The spins 
in the cluster are turned over with probability 1. Sin- 
gle cluster algorithm appears more efficient than the 
multi-cluster one. 

The following is a fairly elegant way of implement- 
ing the Wolff algorithm in C. The function is the core 
part which performs a Wolff single cluster flip. This 
function is recursive. The array for spins s [ ] , perco- 
lation probability P, and coordination number Z (the 
number of neighbors) are passed globally. The first 
argument i of the flip function is the site to be 
flipped, the second argument sO is the spin of the clus- 
ter before flipping. The function neighbor returns an 
array of neighbors of the current site. The function 
drand48() returns a random number unformly dis- 
tributed between and 1. 
void flip(int i, int sO) 
{ 

int j , nn [Z] ; 

s [i] = - sO; 
neighbor(i, nn) ; 
for(j = 0; j < Z; ++j) 

if(s0 == s[nn[j]] && drand48() < P) 
f lip(nn[j] , sO) ; 

} 



V. FURTHER DEVELOPMENT OF CLUSTER 
ALGORITHMS 

The attractive feature of Swendsen-Wang algorithm 
stimulated research for other types of cluster algo- 
rithms and its theory. The original method works for 
Ising and Potts models. It is not clear at the first 
sight how one can generalized it for arbitrary model. 
The idea of Ising spin embedding into a model ap- 
pears to work well. This idea is applied to a num- 
ber of models, including models with continuous de- 
gree of freedomlljti3. Kandel and coworkpsj-pjroposed 
a general scheme for cluster algorithmsIj'c-2, which 



leads to efficient-algorithms for models with compet- 
ing interactionai3't3. Cluster algorithm of a different 
kind (loop .algorithm) was proposed to work with ver- 
tex modelsEZl. ._. 

Sokal's idea of introducing auxiliary variableaifl is 
another way of generalizing the cluster algorithms and 
is drawing attention to statisticians!^. However, this 
method seems not work well in comparison with em- 
bedding algorithms when the system has Ising sym- 
metry and embedding can be implemented. 

A large class of problems in statistical mechanics 
can be now simulated with cluster algorithms. Cluster 
algorithms are used routinely where traditionally local 
algorithms are used. Due to space limitations, this 
types of applications will not be reviewed here. 

VI. APPLICATION IN IMAGE PROCESSING 

The Swendsen-Wang algorithm and some gen- 
eralizations-, were used in connection with image 
processing^. Random imagesE3, in particular, a class 
of random fields called Markov random fields, corre- 
spond closely to systems studied in statistical physics. 
Thus the Monte Carlo techniques developed in statis- 
tical mechanics can be readily used here. There is 
also the need to estimate model parameters^], this 
can be done by Monte Carlo simulation. The most 
important application, appears to be in the problems 
of image restorationcj and image segmentatiorHo. 
The statistical method (or Bayesian inference) in im- 
age analysis is relatively new. The basic problem in 
Bayesian image restoration is the following. Let X 
be the true image, which is also statistical in nature 
and distributed according to P(X), called the prior. 
The true scene, X, is not directly observable, and the 
observed data Y contains noise. Given Y, one wants 
to compute the distribution of X , namely, the condi- 
tional probability P(X\Y), called posterior. This is 
obtained by noting 

P(X\Y) oc P(Y\X)P(X), for fixed Y. (19) 

Image restoration amounts to find the most probable 
X by simulating the probability distribution P(X\Y). 
Ising model is assumed for P(X) in some application; 
while P(Y\X) is given as independent Gaussian dis- 
tribution for Y . From the point of view of statistical 
mechanics, P{X\Y) describes a model with inhomo- 
geneous local magnetic field which is determined by 
Y. 

Segmentation is a process of classifying each pixel 
in an image. We can think of it as a special case of 
restoration problem. Thus the two problems share the 
same mathematical structure. The problem of speed 
of convergence in image segmentation- by Swendsen- 
Wang algorithm is studied inc3 andE-1 The cluster 
algorithms work best when the system has a high de- 
gree of symmetry. The two types of problems, image 
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restoration and image segmentation, all end up in- 
volving the simulation of a model with the presence 
of complicated magnetic field, which makes cluster al- 
gorithms less effective. The cluster algorithms and 
other acceleration algorithms, e.g., ref.o, developed 
in statistical physics may be very helpful in this field. 
But much more work needs to be done. 
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